A Method for Providing Access to 64-Bit Files in 32-Bit Operating Systems
This paper describes Programmed Logic Corporation's (PLC's) approach to providing access to 64-bit files in 32-bit operating systems. We present the application programming interface (API) for accessing 64-bit files and explain why we chose these interfaces. We assume that the reader is familiar with the issues involved in supporting 64-bit files, such as those discussed in [1].
The product that allows us to provide 64-bit file access in 32-bit operating systems is called TeraFile. It is an add-on software product that provides the libraries, some 64-bit-aware commands, and the mechanism to access 64-bit files. We do not rely on changes to the compilation system to support 64-bit integral types such as longlong, though we take advantage of them when they are available. Nor do we depend on the operating system to provide any support for 64-bit files.
The underlying technology that allows TeraFile to support 64-bit files is a file system called StackFS (for PLC's "Stackable File System"). StackFS, when combined with a 64-bit personality module, presents the illusion of a 64-bit file to an application, hiding the actual accesses to 32-bit files that act as different portions of a larger, 64-bit file. Thus, StackFS and the 64-bit module take multiple 32-bit files and make them appear as if they were one large file.
The StackFS/64-bit combination is essentially a multiplexer at the file system level. It routes I/O requests from applications to one or more files that exist on other file systems. It is similar in concept to a logical volume manager, but instead of dealing with disk partitions, it deals with files.
Figure 1 illustrates how a 64-bit file can be created from multiple 32-bit files. Each 32-bit file represents a different range of offsets into the 64-bit file. Up to offset x, I/O requests are routed to the first file. I/O requests pertaining to offsets between x and y are routed to the second file. The third file is used to satisfy I/O requests between offsets y and z. There is no limit to the number of 32-bit files that can be used to represent a 64-bit file.
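As a minimal sketch of the routing just described, the following C fragment maps an offset in the 64-bit file to a constituent 32-bit file and an offset within it. The names segment and route_offset are hypothetical, as is the linear table of ranges. The paper does not say how StackFS actually records the boundaries x, y, and z.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one constituent 32-bit file: it serves the
 * half-open range [start, start + length) of the 64-bit file. */
struct segment {
    uint64_t start;   /* first 64-bit offset served by this file */
    uint32_t length;  /* bytes covered; always fits in 32 bits */
};

/* Find the segment that serves `offset` and compute the offset inside it.
 * Returns the segment index, or -1 if no segment covers the offset. */
int route_offset(const struct segment *segs, size_t nsegs,
                 uint64_t offset, uint32_t *local)
{
    size_t i;

    assert(segs != NULL && local != NULL);
    for (i = 0; i < nsegs; i++) {
        if (offset >= segs[i].start &&
            offset - segs[i].start < segs[i].length) {
            *local = (uint32_t)(offset - segs[i].start);
            return (int)i;
        }
    }
    return -1;
}
```

With three segments of 0x7FFFFFFF bytes each, an I/O request at offset 0x100000000 would be routed to the third file at local offset 2.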
Using ordinary 32-bit files as portions of a larger 64-bit file means that the size of a 64-bit file is not limited to the maximum size of a file system. The constituent 32-bit files can reside on separate file systems. They can even reside on different machines when TeraFile is used with a network file system.
TeraFile provides versions of system calls that require 64-bit offsets. These system calls include lseek, truncate, ftruncate, stat, lstat, fstat, mmap, and lockf. (lockf is actually a library routine built on top of the file and record locking fcntl commands.) The 64-bit versions of these interfaces are patterned after those interfaces discussed in [2], but are implemented using the ioctl system call. Since the file system always has the first crack at processing an ioctl call on any file controlled by that file system, ioctl commands can be applied to regular files, in contrast to the traditional case where they can only be applied to character special files.
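To illustrate how a 64-bit seek could be carried over an ioctl-style call, here is a minimal sketch. Everything named in it is an assumption: the command code TF_LSEEK64, the argument block tf_lseek64_args, and the wrapper tf_lseek64 are hypothetical, since the paper does not give TeraFile's real command values or structures. The sketch uses a signed 64-bit offset so that -1 can signal an error. The ioctl entry point is passed in as a function pointer, and fake_fs_ioctl stands in for the file system's handler, so the example runs without a file system that understands the command.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

#define TF_LSEEK64 1UL          /* hypothetical command code */

struct tf_lseek64_args {        /* hypothetical in/out argument block */
    int64_t offset;             /* in: requested offset; out: new position */
    int     whence;             /* SEEK_SET or SEEK_CUR in this sketch */
};

/* Same shape as ioctl(fd, cmd, arg). */
typedef int (*tf_ioctl_fn)(int fd, unsigned long cmd, void *arg);

/* Library side: pack the arguments, issue the call, unpack the result. */
int64_t tf_lseek64(tf_ioctl_fn do_ioctl, int fd, int64_t offset, int whence)
{
    struct tf_lseek64_args args;

    assert(do_ioctl != NULL);
    args.offset = offset;
    args.whence = whence;
    if (do_ioctl(fd, TF_LSEEK64, &args) == -1)
        return -1;
    return args.offset;
}

/* Stand-in for the file system side: tracks a single seek pointer. */
static int64_t fake_pos;

int fake_fs_ioctl(int fd, unsigned long cmd, void *arg)
{
    struct tf_lseek64_args *a = arg;

    (void)fd;
    if (cmd != TF_LSEEK64)
        return -1;
    if (a->whence == SEEK_SET)
        fake_pos = a->offset;
    else if (a->whence == SEEK_CUR)
        fake_pos += a->offset;
    else
        return -1;              /* SEEK_END is not modeled here */
    a->offset = fake_pos;
    return 0;
}
```

In a real build the function pointer would presumably be replaced by a direct call to ioctl on the open file descriptor.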
Figure 1. 64-bit File Support Using 32-bit Files
The overall approach we took in designing the 64-bit API was driven by two requirements: we couldn't disrupt existing interfaces or data types because we were building an add-on product, and we couldn't produce a 64-bit version of every interface imaginable. Most customers that we've talked to about supporting 64-bit files needed very large files only in special circumstances, such as storing digitized video or managing a large database. In these cases, it was unnecessary to provide anything but a handful of interfaces, because the large files were to be accessed by a small set of applications that our customers controlled.
Instead of hiding the 64-bit data type in an existing data type (such as off_t), we chose to provide a separate type that explicitly calls attention to the fact that the data type occupies 64 bits. This allows applications that wish to be 64-bit-aware to take advantage of the interfaces, while all other (32-bit) applications remain unchanged. Our 64-bit data type, shown in Figure 2, is treated as an unsigned 64-bit integer.
Figure 2. 64-bit Data Type
On systems where the C compiler does not support an integral 64-bit data type, we made our 64-bit data type a structure. Accordingly, we provide several library routines to perform simple arithmetic using 64-bit integers, a library routine to compare two 64-bit integers, a library routine to print 64-bit numbers, and a library routine to convert the ASCII representation of a 64-bit number to a 64-bit integer. While this may seem unwieldy at first, note that the uint64_t could later be replaced by an integral 64-bit data type supported by the C compiler without affecting either the source code or binary code.
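The following sketch shows what a structure-based 64-bit integer and a few of its support routines might look like when only 32-bit arithmetic is available. The type name u64s_t, its field names, and the routine names are all hypothetical. The paper does not show the layout or the names PLC used, and only addition, comparison, and decimal formatting are covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct form of an unsigned 64-bit integer. */
typedef struct {
    uint32_t hi;  /* most significant 32 bits */
    uint32_t lo;  /* least significant 32 bits */
} u64s_t;

/* a + b, wrapping modulo 2^64. */
u64s_t u64s_add(u64s_t a, u64s_t b)
{
    u64s_t r;

    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);  /* carry out of the low word */
    return r;
}

/* Returns -1, 0, or 1 when a is less than, equal to, or greater than b. */
int u64s_cmp(u64s_t a, u64s_t b)
{
    if (a.hi != b.hi)
        return a.hi < b.hi ? -1 : 1;
    if (a.lo != b.lo)
        return a.lo < b.lo ? -1 : 1;
    return 0;
}

/* Divide *v by 10 in place with 32-bit arithmetic; returns the remainder.
 * The low word is processed as two 16-bit digits (long division). */
static uint32_t u64s_divmod10(u64s_t *v)
{
    uint32_t rem, part, q_upper;

    rem     = v->hi % 10;
    v->hi   = v->hi / 10;
    part    = (rem << 16) | (v->lo >> 16);
    q_upper = part / 10;
    rem     = part % 10;
    part    = (rem << 16) | (v->lo & 0xFFFFu);
    v->lo   = (q_upper << 16) | (part / 10);
    return part % 10;
}

/* Write v in decimal into buf, which must hold at least 21 bytes. */
char *u64s_to_str(u64s_t v, char *buf)
{
    char tmp[20];               /* 2^64 - 1 has 20 decimal digits */
    int n = 0, i = 0;

    assert(buf != NULL);
    do {
        tmp[n++] = (char)('0' + u64s_divmod10(&v));
    } while (v.hi != 0 || v.lo != 0);
    while (n > 0)
        buf[i++] = tmp[--n];    /* digits were produced in reverse */
    buf[i] = '\0';
    return buf;
}
```

Because callers go through routines such as these rather than using operators directly, the structure could later be swapped for a compiler-supported integral type behind the same names.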
The lseek64 interface, shown in Figure 3, is the same as the lseek interface, except that the offset and return value are 64-bit quantities. We chose to return a 64-bit quantity instead of passing the offset by reference (thus making it an in-out parameter), because even though we were building interfaces for 32-bit applications, we thought it was more important to provide a migration path from 32-bit applications to 64-bit applications. In this case, the natural implementation in a 64-bit operating system is to pass the 64-bit offset by value and return the 64-bit offset, maintaining consistency with the 32-bit lseek interface.
off64_t lseek64(int fd, off64_t offset, int whence);
Figure 3. lseek64 Interface
The truncate64 interfaces, shown in Figure 4, are the same as the truncate interfaces, except that the length is a 64-bit quantity.
int truncate64(const char *path, off64_t length);
int ftruncate64(int fd, off64_t length);
Figure 4. truncate64 Interface
The mmap64 interface, shown in Figure 5, is the same as the mmap interface, except that the offset is a 64-bit quantity. Note that the length remains a 32-bit quantity. Since 32-bit operating systems generally provide 32-bit address spaces, applications are restricted to mapping only a small (less than 2 GB) section of a 64-bit file at a time. Removal of this restriction would imply that the operating system and compiler would have to support 64-bit pointers and 64-bit address spaces.
caddr_t mmap64(caddr_t addr, size_t len, int prot, int flags, int fd, off64_t off);
Figure 5. mmap64 Interface
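Because only a section of a large file can be mapped at a time, an application typically has to choose a page-aligned window around the byte it wants. The sketch below computes such a window. The names map_window and window_for are hypothetical helpers, not part of the TeraFile API, and the page size and maximum window length are supplied by the caller as assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a page-aligned window onto a large file. */
struct map_window {
    uint64_t file_off;  /* page-aligned value for mmap64's `off` argument */
    uint32_t len;       /* value for the 32-bit `len` argument */
    uint32_t delta;     /* position of the target byte inside the mapping */
};

/* Compute a window of at most max_len bytes that contains `target`.
 * page_size must be a power of two; max_len a non-zero multiple of it. */
struct map_window window_for(uint64_t target, uint64_t file_size,
                             uint32_t page_size, uint32_t max_len)
{
    struct map_window w;

    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(max_len != 0 && max_len % page_size == 0);
    assert(target < file_size);

    /* Round the target down to a page boundary. */
    w.file_off = target & ~(uint64_t)(page_size - 1);
    w.delta    = (uint32_t)(target - w.file_off);

    /* Never map past end of file, and never more than max_len bytes. */
    if (file_size - w.file_off < max_len)
        w.len = (uint32_t)(file_size - w.file_off);
    else
        w.len = max_len;
    return w;
}
```

The resulting file_off and len would be passed to mmap64, and the target byte would then be found at the returned address plus delta.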
The lockf64 interface, shown in Figure 6, is the same as the lockf interface, except that the length is a 64-bit quantity.
int lockf64(int fd, int func, off64_t length);
Figure 6. lockf64 Interface
Figure 7 shows the stat64 interfaces. The stat64 structure is exactly the same size as the stat structure in SVR4, but the st_size member now occupies 64 bits. In the SVR4 stat structure, a 32-bit integer was reserved next to st_size so that this field could be expanded in the future.
To aid porting of existing applications, we provided users with a header file that, under control of a manifest constant that can be passed to the compiler, transparently maps all the 32-bit system calls to their 64-bit counterparts, performing any necessary type conversions. Our experience has been that, assuming the compiler supports 64-bit integral types and that the source code is strictly typed, existing 32-bit code can be ported to the new 64-bit environment by simply including our new header file and recompiling.
struct stat64 {
    dev_t       st_dev;
    long        st_pad1[3];
    ino_t       st_ino;
    mode_t      st_mode;
    nlink_t     st_nlink;
    uid_t       st_uid;
    gid_t       st_gid;
    dev_t       st_rdev;
    long        st_pad2[2];
    uint64_t    st_size;
    timestruc_t st_atim;
    timestruc_t st_mtim;
    timestruc_t st_ctim;
    long        st_blksize;
    long        st_blocks;
    char        st_fstype[16];
    long        st_pad4[8];
};
#ifndef st_atime
#define st_atime st_atim.tv_sec
#define st_mtime st_mtim.tv_sec
#define st_ctime st_ctim.tv_sec
#endif
int lstat64(const char *path, struct stat64 *sbuf);
int stat64(const char *path, struct stat64 *sbuf);
int fstat64(int fd, struct stat64 *sbuf);
Figure 7. stat64 Interfaces
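The claim that widening st_size does not change the size of the structure can be checked with a small layout comparison. The sketch below is an illustration only. It assumes every scalar field is 32 bits wide, as on a typical 32-bit SVR4 system, and uses fixed-width stand-in types. The names stat32_layout, stat64_layout, ts32_t, and layouts_match are hypothetical, and st_pad3 stands for the reserved word next to st_size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for timestruc_t with 32-bit members (an assumption). */
typedef struct { int32_t tv_sec; int32_t tv_nsec; } ts32_t;

/* 32-bit layout: st_size is followed by a reserved 32-bit word. */
struct stat32_layout {
    int32_t st_dev;
    int32_t st_pad1[3];
    int32_t st_ino, st_mode, st_nlink, st_uid, st_gid, st_rdev;
    int32_t st_pad2[2];
    int32_t st_size;
    int32_t st_pad3;            /* reserved for future expansion */
    ts32_t  st_atim, st_mtim, st_ctim;
    int32_t st_blksize, st_blocks;
    char    st_fstype[16];
    int32_t st_pad4[8];
};

/* 64-bit layout: st_size absorbs the reserved word. */
struct stat64_layout {
    int32_t  st_dev;
    int32_t  st_pad1[3];
    int32_t  st_ino, st_mode, st_nlink, st_uid, st_gid, st_rdev;
    int32_t  st_pad2[2];
    uint64_t st_size;
    ts32_t   st_atim, st_mtim, st_ctim;
    int32_t  st_blksize, st_blocks;
    char     st_fstype[16];
    int32_t  st_pad4[8];
};

/* Returns 1 when widening st_size leaves the size and the position of
 * the neighboring fields unchanged. */
int layouts_match(void)
{
    return sizeof(struct stat32_layout) == sizeof(struct stat64_layout)
        && offsetof(struct stat32_layout, st_size) ==
           offsetof(struct stat64_layout, st_size)
        && offsetof(struct stat32_layout, st_atim) ==
           offsetof(struct stat64_layout, st_atim);
}
```

Under these assumptions st_size begins at byte offset 48, which is already a multiple of 8, so the wider field introduces no extra padding.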
Solving the 64-bit file access problem with TeraFile has several advantages. First, TeraFile provides support for 64-bit files in 32-bit operating systems without requiring that the base operating system be changed. This means that OS vendors are spared the expense of overhauling their operating systems to support 64-bit files. In addition, this means less disruption to customers, since enabling their systems to support 64-bit files involves a simple add-on software installation instead of a complicated OS upgrade.
Next, the 64-bit programming interfaces are provided in addition to the existing 32-bit ones. This means that existing 32-bit applications are not impacted by the modification of interfaces that they are currently using. In addition, 32-bit applications can coexist with 64-bit-aware applications without getting in each other's way.
Because each 64-bit file is actually made up of 32-bit files, existing system software is less affected. For example, the software that backs up and restores files doesn't have to change to understand 64-bit files. Instead of backing up the 64-bit files, administrators can back up the constituent 32-bit files.
By providing a separate data type to represent 64-bit integers and routines to manage variables of that type, we avoid compiler issues altogether. The 64-bit interfaces have been designed to facilitate the move to an integral 64-bit data type should the OS vendor decide to take that step. This protects the investment in existing software.
TeraFile provides a way to make multiple 32-bit files look like a single 64-bit file. The end result is support for 64-bit files without requiring any base operating system changes or compiler changes. Providing the 64-bit file API separately from the 32-bit file API allows us to support 64-bit file access without impacting existing applications.
For more information contact:
PROGRAMMED LOGIC CORPORATION
200 Cottontail Lane Somerset, NJ 08873
Phone: 908-302-0090; 1-800-967-0050; Fax: 908-302-1903
http://www.plc.com
Email: info@plc.com; sales@plc.com
This page, and all contents, are Copyright (C) 1995 by Programmed Logic Corporation, 200 Cottontail Lane, Somerset, N.J. 08873, U.S.A.